Comparing Rating Scales and Preference Judgements in Language Evaluation

Authors

  • Anja Belz
  • Eric Kow

Abstract

Rating-scale evaluations are common in NLP but are problematic for a range of reasons: they can be unintuitive for evaluators, inter-evaluator agreement and self-consistency tend to be low, and the parametric statistics commonly applied to the results are not generally considered appropriate for ordinal data. In this paper, we compare rating scales with an alternative evaluation paradigm, preference-strength judgement experiments (PJEs), in which evaluators have the simpler task of deciding which of two texts is better in terms of a given quality criterion. We present three pairs of evaluation experiments assessing text fluency and clarity on different data sets; in each pair, one experiment uses a rating scale and the other is a PJE. We find that the PJE versions of the experiments have better evaluator self-consistency and inter-evaluator agreement, and a larger proportion of variation accounted for by system differences, resulting in a larger number of significant differences being found.
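
The abstract compares the two paradigms on inter-evaluator agreement and on the proportion of variation accounted for by system differences, but it does not name the measures used. As a hypothetical illustration only, the Python sketch below computes two standard measures of these properties: Cohen's kappa for two evaluators' categorical preference labels (assumed here to be "A", "B" or "=" for each text pair), and eta-squared (between-system sum of squares divided by total sum of squares) for scores grouped by system. The function names, label set and toy data are assumptions, not the paper's method.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two evaluators' categorical judgements on the same items.

    Hypothetical helper: labels might be "A", "B" or "=" per text pair.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each evaluator's marginal label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        # Both evaluators used one identical label throughout; kappa is
        # undefined here, so by convention report perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def eta_squared(scores_by_system):
    """Share of total score variance accounted for by system (SS_between / SS_total)."""
    all_scores = [s for scores in scores_by_system.values() for s in scores]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
    ss_between = sum(
        len(scores) * (sum(scores) / len(scores) - grand_mean) ** 2
        for scores in scores_by_system.values()
    )
    return ss_between / ss_total if ss_total else 0.0


if __name__ == "__main__":
    # Toy data (invented for illustration): two evaluators judging four pairs.
    print(round(cohen_kappa(["A", "B", "A", "="], ["A", "B", "B", "="]), 3))  # → 0.636
    # Toy scores for two hypothetical systems.
    print(round(eta_squared({"sys1": [1, 2, 3], "sys2": [4, 5, 6]}), 3))  # → 0.771
```

Higher kappa means evaluators agree more often than their label distributions alone would predict, and a higher eta-squared means more of the spread in scores is explained by which system produced the text rather than by item or evaluator noise.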


Similar sources

Comparing quantum versus Markov random walk models of judgements measured by rating scales

Quantum and Markov random walk models are proposed for describing how people evaluate stimuli using rating scales. To empirically test these competing models, we conducted an experiment in which participants judged the effectiveness of public health service announcements from either their own personal perspective or from the perspective of another person. The order of the self versus other judg...

Bias in the perception of phonetic detail in children's speech: A comparison of categorical and continuous rating scales.

Previous research has shown that continuous rating scales can be used to assess phonetic detail in children's productions, and could potentially be used to detect covert contrasts. Two experiments examined whether continuous rating scales have the additional benefit of being less susceptible to task-related biasing than categorical phonetic transcriptions. In both experiments, judgements of chi...

Developing an Analytic Scale for Scoring EFL Descriptive Writing

English language practitioners have long relied on intuition-based scales for rating EFL/ESL writing. As these scales lack an empirical basis, the scores they generate tend to be unreliable, which results in invalid interpretations. Given the significance of the genre of description and the fact that the relevant literature does not introduce any data-based analytic scales for rating EFL descri...

Developing Rating Scale Descriptors for Assessing the Stages of Writing Process: The Constructs Underlying Students' Writing Performances

The purpose of the present study is to develop appropriate scoring scales for each of the defined stages of the writing process, and also to determine to what extent these scoring scales can reliably and validly assess the performances of EFL learners in an academic writing task. Two hundred and two students’ writing samples were collected after a step-by-step process oriented essay writing ins...

Discrete vs. Continuous Rating Scales for Language Evaluation in NLP

Studies assessing rating scales are very common in psychology and related fields, but are rare in NLP. In this paper we assess discrete and continuous scales used for measuring quality assessments of computer-generated language. We conducted six separate experiments designed to investigate the validity, reliability, stability, interchangeability and sensitivity of discrete vs. continuous scales....

Publication date: 2010